OpenCL: add initial FA support #14987
@rmatif Very cool, thank you!
Sorry, got distracted during the past week. Will come back to this asap.
It seems to help small models like qwen2.5-0.5b, qwen2.5-0.5b-Q4_0
A750: ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.16
A830: ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 0800.35 Compiler E031.47.18.28
The current implementation works well for small models (e.g., qwen2.5-0.5B), significantly improving prompt-processing (pp) performance. For larger models, larger configs (e.g., [...]). We will use this implementation as the baseline and investigate and improve it further.
This PR introduces F16/F32 flash attention (FA) support for the OpenCL backend. It has been extremely challenging to achieve good performance on this kind of hardware, but I believe it is now decent enough to serve as a baseline that we can iterate on further. I also believe there is room for improvement in token generation (tg).
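For readers unfamiliar with the technique, here is a minimal CPU sketch of the streaming (online-softmax) computation that flash attention is generally built on. It is not the OpenCL kernel from this PR: the function name `flash_attn_row`, the single-query/single-head layout, the F32-only arithmetic, and the `block` parameter are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference: attention output for ONE query row and ONE head,
// computed in a single streaming pass over K/V in blocks of `block` rows.
// q: [d]; k, v: [n_kv * d] row-major. Returns softmax(scale * q.K^T) . V.
std::vector<float> flash_attn_row(const std::vector<float>& q,
                                  const std::vector<float>& k,
                                  const std::vector<float>& v,
                                  std::size_t n_kv, std::size_t d,
                                  std::size_t block = 4) {
    assert(n_kv > 0 && d > 0 && block > 0);
    assert(q.size() == d && k.size() == n_kv * d && v.size() == n_kv * d);

    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    float m = -std::numeric_limits<float>::infinity(); // running max of scores
    float l = 0.0f;                                    // running sum of exp(score - m)
    std::vector<float> acc(d, 0.0f);                   // unnormalized output

    for (std::size_t start = 0; start < n_kv; start += block) {
        const std::size_t end = std::min(start + block, n_kv);

        // Scaled dot-product scores for this block, plus the updated max.
        std::vector<float> s(end - start);
        float m_new = m;
        for (std::size_t j = start; j < end; ++j) {
            float dot = 0.0f;
            for (std::size_t i = 0; i < d; ++i) dot += q[i] * k[j * d + i];
            s[j - start] = dot * scale;
            m_new = std::max(m_new, s[j - start]);
        }

        // Rescale what was accumulated under the old max.
        // On the first block m is -inf, so corr is exp(-inf) == 0.
        const float corr = std::exp(m - m_new);
        l *= corr;
        for (std::size_t i = 0; i < d; ++i) acc[i] *= corr;

        // Accumulate this block's contribution.
        for (std::size_t j = start; j < end; ++j) {
            const float p = std::exp(s[j - start] - m_new);
            l += p;
            for (std::size_t i = 0; i < d; ++i) acc[i] += p * v[j * d + i];
        }
        m = m_new;
    }

    // Final normalization by the softmax denominator.
    for (std::size_t i = 0; i < d; ++i) acc[i] /= l;
    return acc;
}
```

The point of the sketch is that the full score row is never materialized: only a running max, a running denominator, and a `d`-sized accumulator are kept, which is what lets a GPU kernel hold the working set in registers or local memory. Calling it with different `block` values should give the same result up to floating-point rounding.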
Results on Adreno 830:
Adreno 750: